{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) MONAI Consortium  \n",
    "Licensed under the Apache License, Version 2.0 (the \"License\");  \n",
    "you may not use this file except in compliance with the License.  \n",
    "You may obtain a copy of the License at  \n",
    "&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  \n",
    "Unless required by applicable law or agreed to in writing, software  \n",
    "distributed under the License is distributed on an \"AS IS\" BASIS,  \n",
    "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  \n",
    "See the License for the specific language governing permissions and  \n",
    "limitations under the License. \n",
    "\n",
    "# Auto3DSeg Data Analyzer\n",
    "\n",
    "Data Analyzer is one of the MONAI Auto3DSeg modules. This module provides a comprehensive analysis report using DataAnalyzer class. In this notebook, we will provide a tutorial on how to use the DataAnalyzer class on simulated and real-world datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -c \"import monai\" || pip install -q \"monai-weekly[nibabel, nni, tqdm, cucim, yaml, fire]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import nibabel as nib\n",
    "import numpy as np\n",
    "import tempfile\n",
    "\n",
    "from monai.apps import download_and_extract\n",
    "from monai.apps.auto3dseg import DataAnalyzer\n",
    "from monai.config import print_config\n",
    "from monai.data import create_test_image_3d\n",
    "\n",
    "print_config()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulate a dataset and Auto3D datalist using MONAI functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim_datalist = {\n",
    "    \"testing\": [\n",
    "        {\"image\": \"val_001.fake.nii.gz\"},\n",
    "        {\"image\": \"val_002.fake.nii.gz\"},\n",
    "        {\"image\": \"val_003.fake.nii.gz\"},\n",
    "        {\"image\": \"val_004.fake.nii.gz\"},\n",
    "        {\"image\": \"val_005.fake.nii.gz\"},\n",
    "    ],\n",
    "    \"training\": [\n",
    "        {\"fold\": 0, \"image\": \"tr_image_001.fake.nii.gz\", \"label\": \"tr_label_001.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_002.fake.nii.gz\", \"label\": \"tr_label_002.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_003.fake.nii.gz\", \"label\": \"tr_label_003.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_004.fake.nii.gz\", \"label\": \"tr_label_004.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_005.fake.nii.gz\", \"label\": \"tr_label_005.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_006.fake.nii.gz\", \"label\": \"tr_label_006.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_007.fake.nii.gz\", \"label\": \"tr_label_007.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_008.fake.nii.gz\", \"label\": \"tr_label_008.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_009.fake.nii.gz\", \"label\": \"tr_label_009.fake.nii.gz\"},\n",
    "        {\"fold\": 0, \"image\": \"tr_image_010.fake.nii.gz\", \"label\": \"tr_label_010.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_006.fake.nii.gz\", \"label\": \"tr_label_006.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_007.fake.nii.gz\", \"label\": \"tr_label_007.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_008.fake.nii.gz\", \"label\": \"tr_label_008.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_009.fake.nii.gz\", \"label\": \"tr_label_009.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_010.fake.nii.gz\", \"label\": \"tr_label_010.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_011.fake.nii.gz\", \"label\": \"tr_label_011.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_012.fake.nii.gz\", \"label\": \"tr_label_012.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_013.fake.nii.gz\", \"label\": \"tr_label_013.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_014.fake.nii.gz\", \"label\": \"tr_label_014.fake.nii.gz\"},\n",
    "        {\"fold\": 1, \"image\": \"tr_image_015.fake.nii.gz\", \"label\": \"tr_label_015.fake.nii.gz\"},\n",
    "    ],\n",
    "}\n",
    "\n",
    "\n",
    "def simulate():\n",
    "    test_dir = tempfile.TemporaryDirectory()\n",
    "    dataroot = test_dir.name\n",
    "\n",
    "    # Generate a fake dataset\n",
    "    for d in sim_datalist[\"testing\"] + sim_datalist[\"training\"]:\n",
    "        im, seg = create_test_image_3d(39, 47, 46, rad_max=10)\n",
    "        nib_image = nib.Nifti1Image(im, affine=np.eye(4))\n",
    "        image_fpath = os.path.join(dataroot, d[\"image\"])\n",
    "        nib.save(nib_image, image_fpath)\n",
    "\n",
    "        if \"label\" in d:\n",
    "            nib_image = nib.Nifti1Image(seg, affine=np.eye(4))\n",
    "            label_fpath = os.path.join(dataroot, d[\"label\"])\n",
    "            nib.save(nib_image, label_fpath)\n",
    "\n",
    "    return dataroot, test_dir\n",
    "\n",
    "\n",
    "sim_dataroot, test_dir = simulate()\n",
    "print(\"data are generated and saved in this directory: \", sim_dataroot)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the DataAnalyzer on simulated datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "analyser = DataAnalyzer(sim_datalist, sim_dataroot)\n",
    "datastat = analyser.get_all_case_stats()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you would like to inspect the data stats, please check the `datastats.yaml` and `datastats_by_case.yaml` under the directory\n",
    "\n",
    "Next, we will perform the data analysis on a real-world dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download dataset\n",
    "\n",
    "First we need to set up data directory and download data. Here we can specify a directory with the `MONAI_DATA_DIRECTORY` environment variable to save downloaded dataset and outputs. The dataset comes from http://medicaldecathlon.com/."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "directory = os.environ.get(\"MONAI_DATA_DIRECTORY\")\n",
    "if directory is not None:\n",
    "    os.makedirs(directory, exist_ok=True)\n",
    "root_dir = tempfile.mkdtemp() if directory is None else directory\n",
    "print(root_dir)\n",
    "\n",
    "msd_task = \"Task04_Hippocampus\"\n",
    "resource = \"https://msd-for-monai.s3-us-west-2.amazonaws.com/\" + msd_task + \".tar\"\n",
    "\n",
    "compressed_file = os.path.join(root_dir, msd_task + \".tar\")\n",
    "dataroot = os.path.join(root_dir, msd_task)\n",
    "if not os.path.exists(dataroot):\n",
    "    download_and_extract(resource, compressed_file, root_dir)\n",
    "\n",
    "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")\n",
    "print(\"dataroot path:\", os.path.abspath(dataroot))\n",
    "print(\"datalist file path: \", os.path.abspath(datalist_file))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "analyser = DataAnalyzer(datalist_file, dataroot)\n",
    "datastat = analyser.get_all_case_stats()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run the data analyzer in shell (via Python Fire)\n",
    "\n",
    "The users can run the data analyzer in shell script. With the downloaded path and the datalist, one can run the following command in the terminal:\n",
    "\n",
    "```bash\n",
    "python -m monai.apps.auto3dseg DataAnalyzer get_all_case_stats \\\n",
    "            --datalist=\"<datalist file path>\" \\\n",
    "            --dataroot=\"<dataroot path>\" > /dev/null\n",
    "```\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
